8 research outputs found

    A supervised approach for intra-/inter-community interaction prediction in dynamic social networks

    Due to the growing availability of Internet services in the last decade, interactions between people have become increasingly easy to establish. For example, we can hold an intercontinental job interview, or send real-time multimedia content to any friend who owns a smartphone. All these human activities generate digital footprints that describe complex, rapidly evolving network structures. In such a dynamic scenario, one of the most challenging tasks involves the prediction of future interactions between pairs of actors (i.e., users in online social networks, researchers in collaboration networks). In this paper, we approach this problem by leveraging network dynamics: to this end, we propose a supervised learning approach which exploits features computed by time-aware forecasts of topological measures calculated between node pairs. Moreover, since real social networks are generally composed of weakly connected modules, we instantiate the interaction prediction problem in two disjoint application scenarios: intra-community and inter-community link prediction. Experimental results on real time-stamped networks show how our approach is able to reach high accuracy. Furthermore, we analyze the performance of our methodology when varying the types of features, the community discovery algorithms, and the forecast methods.
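The feature pipeline described above can be sketched in a few lines: compute a topological measure (here, the common-neighbour count) for a node pair across a sequence of network snapshots, then forecast its next value with a simple time-aware model. The toy snapshots, the choice of measure, and the linear-trend forecast are illustrative assumptions, not the paper's exact feature set; in the full approach such forecasted values would feed a supervised classifier.

```python
# Sketch: time-aware forecast of a topological measure for one node pair.
# Snapshots are toy data; common-neighbour count and a linear trend stand
# in for the paper's richer set of measures and forecast methods.

def common_neighbours(adj, u, v):
    """Number of shared neighbours of u and v in one snapshot."""
    return len(adj.get(u, set()) & adj.get(v, set()))

def linear_forecast(series):
    """One-step-ahead forecast via a least-squares linear trend."""
    n = len(series)
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(series) / n
    denom = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, series)) / denom
    return mean_y + slope * (n - mean_x)  # predicted value at time index n

# Three toy snapshots of an undirected network (adjacency sets).
snapshots = [
    {1: {2}, 2: {1}, 3: set(), 4: set()},
    {1: {2, 3}, 2: {1, 3}, 3: {1, 2}, 4: set()},
    {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}},
]

# History of the measure for the pair (2, 4), and its forecasted next value,
# usable as one feature of a supervised link-prediction model.
series = [common_neighbours(s, 2, 4) for s in snapshots]
feature = linear_forecast(series)
```

In practice one would compute several such measures per pair (e.g., Jaccard coefficient, Adamic-Adar, preferential attachment) and train the classifier separately on intra-community and inter-community pairs.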

    Measuring objective and subjective well-being: dimensions and data sources

    Well-being is an important value for people's lives, and it can be considered an index of societal progress. Researchers have suggested two main approaches for the overall measurement of well-being: objective and subjective well-being. Both approaches, as well as their relevant dimensions, have traditionally been captured with surveys. During the last decades, new data sources have been suggested as an alternative or complement to traditional data. This paper aims to present the theoretical background of well-being by distinguishing between objective and subjective approaches, their relevant dimensions, the new data sources used for their measurement, and relevant studies. We also intend to shed light on still largely unexplored dimensions and data sources that could potentially contribute as a key to public policy and social development.

    Big Data Analytics for Nowcasting and Forecasting Social Phenomena

    One of the most pressing and fascinating challenges of our time is understanding the complexity of the global interconnected society we inhabit. This connectedness reveals itself in many phenomena: in the rapid growth of the Internet and the Web, in the ease with which global communication and trade now take place, and in the ability of news and information, as well as epidemics, trends, financial crises, and social unrest, to spread around the world with surprising speed and intensity. Ours is also a time of opportunity to observe and measure how our society intimately works: Big Data originating from the digital breadcrumbs of human activities promise to let us scrutinize the ground truth of individual and collective behavior in unprecedented detail and in real time. Multiple dimensions of our social life have Big Data proxies nowadays. We can use Big Data as signals, as proxies to forecast and nowcast different phenomena, especially social phenomena. We can manage to describe and predict how humans and society work. We can use geolocated data to observe and measure the behavior of a population, to build better cities tailored to the movement of the population, with lower commuting times and lower pollution. We can exploit medical data to build classifiers able to help in diagnosing and curing diseases. We can use industrial data to improve production processes and create smarter and more secure factories. We can do many other useful things with the support of data and of analytical tools able to extract useful knowledge from raw data. In this thesis we introduce data-driven as well as model-driven approaches to predict different phenomena, from epidemics to socio-economic attraction. We use Big Data deriving from our everyday life as external proxies to nowcast and forecast the evolution of phenomena whose study otherwise relies only on historical data, or on data that arrive only with a significant lag.
We use supermarket retail data as an external signal to predict the curve of an internal time series, the influenza one. When the flu season arrives, people start to get sick. Getting sick affects their everyday life and behavior, and this change in behavior should propagate into their supermarket purchases: they will buy products that reflect the fact that they are sick. We also study human movements, which are inherently massive, dynamic, and complex; understanding individual mobility patterns is of fundamental importance for many different phenomena. We exploit these patterns in order to study and predict the attraction of different socio-economic factors of the human environment. In our first approach we study the distribution of the travelling sub-populations of the Tuscany region in Italy across the region's airports, and we build a dynamic model for the interplay between the availability of air travel and an airport's popularity among the population. Based on this model, we forecast the future evolution of the airports in the region. In our second approach, we identify and categorize industrial clusters in the Veneto region in Italy by size and population dynamics, and measure their attraction. We create a real-time system which helps us feel the pulse of a city and predict the rise of new industrial clusters or the death of existing ones. Finally, we attempt prediction in social networks, introducing the interaction prediction problem: we predict intra-community interactions, those that occur in the interior of the same community, and we apply the same approach to predict inter-community interactions, the weak links that keep together the modular structure composing complex networks.
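The retail-signal idea can be made concrete with a minimal sketch: fit a linear model from a purchase index of flu-related products to reported influenza incidence, then use this week's purchases to nowcast incidence before official figures arrive. The toy numbers and the simple ordinary-least-squares model are assumptions for illustration; the thesis's actual data and models differ.

```python
# Sketch: nowcast an "internal" series (influenza incidence) from an
# "external" retail proxy. Toy data, chosen so the relation is exactly
# linear; real signals are noisy and lagged.

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

# Weekly purchase index of flu-related products (external proxy) and the
# incidence reported for the same weeks (internal series).
purchases = [10, 12, 20, 35, 50]
incidence = [1.0, 1.2, 2.0, 3.5, 5.0]

a, b = fit_line(purchases, incidence)
nowcast = a + b * 60  # incidence implied by this week's purchase index of 60
```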

    Automatic Selection of Tags for Photos Based on their Geographical Position

    Innovations in consumer photography have made it exceedingly simple for people to capture images and store them in digital libraries. The creation of semantic metadata about photo content, however, remains an elusive goal, and even a modest amount of annotation can significantly improve the usefulness of such photo collections. This thesis aims to strengthen this area, focusing on Flickr, an active online photo-sharing service. Applications like Flickr provide a large collection of photos characterized by the location where each photo was taken and by a set of tags describing its content. With technologies such as WiFi and GPS, modern cameras can automatically determine the geographical position at which a photo was taken. The purpose of this thesis is to create an application that combines this positional information with the information the photos already carry, in order to improve their description. The application first connects to Flickr to build databases containing data about the photos and their tags. It then categorizes the tags using WordNet, a large lexical database of English. Thereafter, it takes the user's photo data together with a tag and computes an influence score for every candidate tag, a score that takes into account tag co-occurrence as well as the distance between the photos. Finally, it provides the user with a graphical web interface presenting the recommended tags, i.e., the tags with the highest influence score. During design and implementation, the capabilities of the well-organized and functional Flickr API were used extensively, as well as those of the WordNet and Google Maps APIs. Popular tools and development platforms were used, such as the Java programming language, the HTML/CSS/JavaScript/jQuery web stack, and the MySQL relational database, so that the application is autonomous but can also be embedded in another platform.
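The influence score described above can be sketched as follows: each candidate tag collects a vote from every photo that carries the query tag, weighted down by the geographic distance between that photo and the query position. The 1/(1 + km) decay and the toy photo collection are assumptions for illustration; the thesis's exact formula may differ.

```python
# Sketch: distance-weighted tag co-occurrence for tag recommendation.
import math

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

def influence(query_tag, query_pos, photos):
    """For photos carrying query_tag, add a distance-decayed vote to each
    co-occurring tag; nearby co-occurrences count more than distant ones."""
    scores = {}
    for pos, tags in photos:
        if query_tag not in tags:
            continue
        weight = 1 / (1 + haversine_km(query_pos, pos))
        for t in tags:
            if t != query_tag:
                scores[t] = scores.get(t, 0.0) + weight
    return scores

# Toy photo collection: ((lat, lon), {tags}).
photos = [
    ((48.8584, 2.2945), {"eiffel", "paris", "night"}),
    ((48.8583, 2.2944), {"eiffel", "paris"}),
    ((48.8606, 2.3376), {"louvre", "paris"}),
    ((40.7128, -74.0060), {"paris"}),  # far-away photo, no co-tags to offer
]

scores = influence("paris", (48.8584, 2.2945), photos)
best = max(scores, key=scores.get)
```

Any monotone distance decay would serve the same purpose; 1/(1 + km) is just a simple choice that keeps co-located photos at full weight.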

    Glacier: guided locally constrained counterfactual explanations for time series classification

    In machine learning applications, there is a need to obtain predictive models of high performance and, most importantly, to allow end-users and practitioners to understand and act on their predictions. One way to obtain such understanding is via counterfactuals, which provide sample-based explanations in the form of recommendations on which features of a test example need to be modified so that the classification outcome of a given classifier changes from an undesired outcome to a desired one. This paper focuses on the domain of time series classification and, more specifically, on defining counterfactual explanations for univariate time series. We propose Glacier, a model-agnostic method for generating locally-constrained counterfactual explanations for time series classification using gradient search, either on the original space or on a latent space learned through an auto-encoder. An additional flexibility of our method is the inclusion of constraints on the counterfactual generation process that favour applying changes to particular time series points or segments while discouraging changes to others. The main purpose of these constraints is to ensure more reliable counterfactuals while increasing the efficiency of the counterfactual generation process. Two particular types of constraints are considered, i.e., example-specific constraints and global constraints. We conduct extensive experiments on 40 datasets from the UCR archive, comparing different instantiations of Glacier against three competitors. Our findings suggest that Glacier outperforms the three competitors in terms of two common metrics for counterfactuals, i.e., proximity and compactness. Moreover, Glacier obtains counterfactual validity comparable to the best of the three competitors.
Finally, when comparing the unconstrained variant of Glacier to the constraint-based variants, we conclude that the inclusion of example-specific and global constraints yields good performance while demonstrating the trade-off between the different metrics. © The Author(s) 2024. This work was funded in part by the Digital Futures cross-disciplinary research centre in Sweden and the EXTREMUM collaborative project ( https://datascience.dsv.su.se/projects/extremum.html ).
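The constrained-gradient-search idea can be illustrated on a toy differentiable classifier: ascend the gradient of the desired class probability, but zero it wherever a binary mask forbids edits, so only the allowed segment of the series is perturbed. This is a didactic sketch under simplifying assumptions (a linear-logistic classifier, a hand-set mask, a fixed step size), not the Glacier algorithm itself.

```python
# Sketch: mask-constrained gradient search for a counterfactual.
import math

def classifier(x, w):
    """Logistic probability of class 1 under fixed per-timestep weights w."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1 / (1 + math.exp(-z))

def counterfactual(x, w, mask, target=0.9, lr=0.5, steps=200):
    """Gradient ascent on the class-1 probability; mask zeroes the gradient
    at timesteps that must not change."""
    cf = list(x)
    for _ in range(steps):
        p = classifier(cf, w)
        if p >= target:
            break
        # d p / d cf_i = p * (1 - p) * w_i, suppressed where mask is 0
        grad = [p * (1 - p) * wi * mi for wi, mi in zip(w, mask)]
        cf = [ci + lr * gi for ci, gi in zip(cf, grad)]
    return cf

x = [0.0] * 8                    # test example, initially near p = 0.5
w = [1.0] * 8                    # toy classifier weights
mask = [1, 1, 1, 0, 0, 0, 0, 0]  # only the first segment may be edited
cf = counterfactual(x, w, mask)
```

Restricting edits to a segment keeps the counterfactual compact and close to the original series, at the possible cost of needing larger changes inside the allowed segment.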

    A Remark on Concept Drift for Dependent Data

    Hinder F, Vaquet V, Hammer B. A Remark on Concept Drift for Dependent Data. In: Miliou I, Piatkowski N, Papapetrou P, eds. Advances in Intelligent Data Analysis XXII. 22nd International Symposium on Intelligent Data Analysis, IDA 2024, Stockholm, Sweden, April 24–26, 2024, Proceedings, Part I. Lecture Notes in Computer Science. Cham: Springer Nature Switzerland; 2024: 77-89.
    Concept drift, i.e., a change of the data-generating distribution, can render machine learning models inaccurate. Several works address the phenomenon of concept drift in the streaming context, usually assuming that consecutive data points are independent of each other. To generalize to dependent data, many authors link the notion of concept drift to time series. In this work, we show that temporal dependencies strongly influence the sampling process, so the definitions in use need major modifications. In particular, we show that the notion of stationarity is not suited for this setup and discuss an alternative we refer to as consistency. We demonstrate that consistency better describes the observable learning behavior in numerical experiments.